我们提出了一个新颖的框架,用于设计无乘数内核机器,该机器可以在智能边缘设备等资源约束平台上使用。该框架使用基于边缘传播(MP)技术的分段线性(PWL)近似值,仅使用加法/减法,移位,比较和寄存器底流/溢出操作。我们建议使用针对现场可编程门阵列(FPGA)平台进行优化的基于硬件的MP推理和在线培训算法。我们的FPGA实施消除了对DSP单元的需求,并减少了LUT的数量。通过重复使用相同的硬件进行推理和培训,我们表明该平台可以克服由MP近似产生的分类错误和本地最小值。该提议的无乘数MP-Kernel机器在FPGA上的实施导致估计的能源消耗为13.4 PJ,功率消耗为107 MW,每台均具有〜9K LUTS和FFS,每张均具有256 x 32个大小的核与其他可比实现相比,区域和区域。
translated by 谷歌翻译
最近对基于细粒的基于草图的图像检索(FG-SBIR)的重点已转向将模型概括为新类别,而没有任何培训数据。但是,在现实世界中,经过训练的FG-SBIR模型通常应用于新类别和不同的人类素描器,即不同的绘图样式。尽管这使概括问题复杂化,但幸运的是,通常可以使用一些示例,从而使模型适应新的类别/样式。在本文中,我们提供了一种新颖的视角 - 我们没有要求使用概括的模型,而是提倡快速适应的模型,在测试过程中只有很少的样本(以几种方式)。为了解决这个新问题,我们介绍了一种基于几个关键修改的基于新型的模型 - 静态元学习(MAML)框架:(1)作为基于边缘的对比度损失的检索任务,我们简化了内部循环中的MAML训练使其更稳定和易于处理。 (2)我们的对比度损失的边距也通过其余模型进行了元学习。 (3)在外循环中引入了另外三个正规化损失,以使元学习的FG-SBIR模型对类别/样式适应更有效。在公共数据集上进行的广泛实验表明,基于概括和基于零射的方法的增益很大,还有一些强大的射击基线。
translated by 谷歌翻译
In this paper, we extend scene understanding to include that of human sketch. The result is a complete trilogy of scene representation from three diverse and complementary {modalities} -- sketch, photo, and text. Instead of learning a rigid three-way embedding and be done with it, we focus on learning a flexible joint embedding that fully supports the ``optionality" that this complementarity brings. Our embedding supports optionality on two axis: (i) optionality across modalities -- use any combination of modalities as query for downstream tasks like retrieval, (ii) optionality across tasks -- simultaneously utilising the embedding for either discriminative (e.g., retrieval) or generative tasks (e.g., captioning). This provides flexibility to end-users by exploiting the best of each modality, therefore serving the very purpose behind our proposal of a trilogy at the first place. First, a combination of information-bottleneck and conditional invertible neural networks disentangle the modality-specific component from modality-agnostic in sketch, photo, and text. Second, the modality-agnostic instances from sketch, photo, and text are synergised using a modified cross-attention. Once learned, we show our embedding can accommodate a multi-facet of scene-related tasks, including those enabled for the first time by the inclusion of sketch, all without any task-specific modifications.
translated by 谷歌翻译
我们使用徒手场景草图FS-Coco的第一个数据集将草图研究推向了场景。考虑到实用的应用,我们收集的草图很好地传达了场景内容,但可以在几分钟之内由具有素描技巧的人勾勒出来。我们的数据集包含10,000个徒手场景向量素描,每点时空信息由100个非专家个人提供,提供对象和场景级抽象。每个草图都用文本描述增强。使用我们的数据集,我们首次研究了徒手场景草图和草图标题的细粒度图像检索问题。我们了解以下内容:(i)使用笔触的时间顺序在草图中编码的场景显着性; (ii)从场景草图和图像标题中进行图像检索的性能比较; (iii)素描和图像标题中信息的互补性,以及结合两种方式的潜在优势。此外,我们扩展了一个流行的矢量草图基于LSTM的编码器,以处理比以前的工作所支持的更复杂性的草图。也就是说,我们提出了一个层次草图解码器,我们将其在特定于草图的“预文本”任务中利用。我们的数据集可以首次研究徒手场景素描理解及其实际应用。
translated by 谷歌翻译
深度学习模型在各种自然语言处理任务中设置了基准。然而,这些模型需要巨大的培训数据,这在许多实际问题中是不可行的。虽然各种技术如域适应,但是几个学习技术解决了这个问题,我们介绍了一种积极地将外部知识的新技术引入学习以解决低数据制度问题。我们提出了一种称为Actknow的技术,它基于知识图(KG)的“按需”在学习中,激发了知识图表(KG)的知识(QA)。通过从概念网络中注入世界知识,我们对基于文本的基于文本的变压器模型的临时挑战 - 在低数据制度中的变压器模型上显示了显着的改进。例如,通过仅使用20%的训练示例,我们分别证明了弧形挑战和OpenBookQA的准确性提高了4%。
translated by 谷歌翻译
在本文中,我们展示了一种基于语料库的治疗两种语言 - 英语和印地语。它研究了印地语和英语翻译并行语料库中的礼貌,看到了印地语文本中的礼貌被翻译成英文。我们提供了一个详细的理论背景,其中进行了比较,然后进行了本理论模型中的翻译数据的简要描述。由于礼貌可能成为冲突和误解的主要原因之一,因此是一种非常重要的现象,以跨文化而被研究和理解,特别是因为机器翻译的这种目的。
translated by 谷歌翻译
本文提出了创造和管理12个主要印度语言的大型并行语言(即将扩展到23种语言)的挑战,作为由信息技术部(DIT),政府部门资助的主要财团项目的一部分。印度,并在印度的10所不同大学中平行运行。为了有效地管理这些巨大的Corpora的创建和传播过程,基于Web的(具有减少的独立版本)的注释工具ILCiann(印度语言语料集团倡议注释工具)已经开发出来。它主要是为POS注释制定的,以及由具有不同竞争力和物理位于相距远的地点的人员的管理器的管理。为了维持在创建Corpora中的一致性和标准,有必要每个人都在这个工具提供的共同平台上。
translated by 谷歌翻译
Quadruped robots are currently used in industrial robotics as mechanical aid to automate several routine tasks. However, presently, the usage of such a robot in a domestic setting is still very much a part of the research. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from the speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost at par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
translated by 谷歌翻译
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
translated by 谷歌翻译
Machine Translation (MT) system generally aims at automatic representation of source language into target language retaining the originality of context using various Natural Language Processing (NLP) techniques. Among various NLP methods, Statistical Machine Translation(SMT). SMT uses probabilistic and statistical techniques to analyze information and conversion. This paper canvasses about the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefed with a short description related to our experimental need. Further, a detailed analysis of Samanantar and OPUS dataset for model building, along with standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as a part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise of the dataset. To create the system, MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim to understand the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES
translated by 谷歌翻译